If $S$ is an infinite sequence over a finite alphabet $\Sigma$ and $\beta$ is a probability measure on $\Sigma$, then the
dimension of $S$ with respect to $\beta$, written $\dim^\beta(S)$, is a constructive version of Billingsley dimension
that coincides with the (constructive Hausdorff) dimension $\dim(S)$ when $\beta$ is the uniform probability
measure. This paper shows that $\dim^\beta(S)$ and its dual $\operatorname{Dim}^\beta(S)$, the strong dimension of $S$ with
respect to $\beta$, can be used in conjunction with randomness to measure the similarity of two probability
measures $\alpha$ and $\beta$ on $\Sigma$. Specifically, we prove that the divergence formula

\[ \dim^\beta(R) = \operatorname{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)} \]

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random
with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $\mathcal{D}(\alpha \| \beta)$ is the
Kullback-Leibler divergence between $\alpha$ and $\beta$.



1 Introduction 

The constructive dimension $\dim(S)$ and the constructive strong dimension $\operatorname{Dim}(S)$ of an infinite sequence
$S$ over a finite alphabet $\Sigma$ are constructive versions of the two most important classical fractal dimensions,
namely, Hausdorff dimension [7] and packing dimension [20, 19], respectively. These two constructive
dimensions, which were introduced in [11, 1], have been shown to have the useful characterizations

\[ \dim(S) = \liminf_{w \to S} \frac{K(w)}{|w| \log |\Sigma|} \tag{1.1} \]

and

\[ \operatorname{Dim}(S) = \limsup_{w \to S} \frac{K(w)}{|w| \log |\Sigma|}, \tag{1.2} \]

where the logarithm is base-2 [?, 1]. In these equations, $K(w)$ is the Kolmogorov complexity of the
prefix $w$ of $S$, i.e., the length in bits of the shortest program that prints the string $w$. (See [?] for details.) The
numerators in these equations are thus the algorithmic information content of $w$, while the denominators
are the "naive" information content of $w$, also in bits. We thus understand (1.1) and (1.2) to say that
$\dim(S)$ and $\operatorname{Dim}(S)$ are the lower and upper information densities of the sequence $S$. These constructive
dimensions and their analogs at other levels of effectivity have been investigated extensively in recent
years [?].
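The information-density reading of (1.1) and (1.2) can be illustrated numerically, with a strong caveat: $K(w)$ is uncomputable, so the sketch below substitutes the output length of `zlib` as a crude, computable stand-in. That substitution, the function name `compressed_density`, and the test strings are assumptions made for illustration only and are not part of the paper.

```python
import math
import random
import zlib


def compressed_density(w: str, alphabet_size: int) -> float:
    """Crude proxy for K(w) / (|w| * log2 |Sigma|).

    ASSUMPTION: zlib's compressed length (in bits) stands in for the
    uncomputable Kolmogorov complexity K(w). This is only a rough
    illustration, not the quantity defined in the paper.
    """
    compressed_bits = 8 * len(zlib.compress(w.encode("ascii"), 9))
    naive_bits = len(w) * math.log2(alphabet_size)
    return compressed_bits / naive_bits


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    periodic = "ab" * 5000
    noisy = "".join(rng.choice("ab") for _ in range(10000))
    print("periodic prefix:", compressed_density(periodic, 2))
    print("pseudo-random prefix:", compressed_density(noisy, 2))
```

A highly regular prefix yields a ratio near 0, while a prefix with no pattern that `zlib` can exploit yields a ratio near (or, because of format overhead, slightly above) 1.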
The constructive dimensions $\dim(S)$ and $\operatorname{Dim}(S)$ have recently been generalized to incorporate a
probability measure $\nu$ on the sequence space $\Sigma^\infty$ as a parameter [13]. Specifically, for each such $\nu$ and
each sequence $S$ we now have the constructive dimension $\dim^\nu(S)$ and the constructive strong
dimension $\operatorname{Dim}^\nu(S)$ of $S$ with respect to $\nu$. (The first of these is a constructive version of Billingsley
dimension [?].) When $\nu$ is the uniform probability measure on $\Sigma^\infty$, we have $\dim^\nu(S) = \dim(S)$ and
$\operatorname{Dim}^\nu(S) = \operatorname{Dim}(S)$. A more interesting example occurs when $\nu$ is the product measure generated by a
nonuniform probability measure $\beta$ on the alphabet $\Sigma$. In this case, $\dim^\nu(S)$ and $\operatorname{Dim}^\nu(S)$, which we write
as $\dim^\beta(S)$ and $\operatorname{Dim}^\beta(S)$, are again the lower and upper information densities of $S$, but these densities
are now measured with respect to unequal letter costs. Specifically, it was shown in [?] that

\[ \dim^\beta(S) = \liminf_{w \to S} \frac{K(w)}{\mathcal{I}_\beta(w)} \tag{1.3} \]

and

\[ \operatorname{Dim}^\beta(S) = \limsup_{w \to S} \frac{K(w)}{\mathcal{I}_\beta(w)}, \tag{1.4} \]

where

\[ \mathcal{I}_\beta(w) = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])} \]

is the Shannon self-information of $w$ with respect to $\beta$. These unequal letter costs $\log(1/\beta(a))$ for $a \in \Sigma$
can in fact be useful. For example, the complete analysis of the dimensions of individual points in
self-similar fractals given by [13] requires these constructive dimensions with a particular choice of the
probability measure $\beta$ on $\Sigma$.
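The Shannon self-information in the denominators of (1.3) and (1.4) is straightforward to compute. The following minimal sketch shows how the letter costs $\log(1/\beta(a))$ add up along a string; the function name `self_information` and the example measures are hypothetical choices made here, and logarithms are taken base 2 as in the text.

```python
import math
from typing import Mapping


def self_information(w: str, beta: Mapping[str, float]) -> float:
    """Return I_beta(w) = sum over i of log2(1 / beta(w[i])), in bits.

    ASSUMPTION: beta is positive on every symbol that occurs in w.
    """
    return sum(math.log2(1.0 / beta[symbol]) for symbol in w)


if __name__ == "__main__":
    uniform = {"a": 0.5, "b": 0.5}   # hypothetical example measures
    skewed = {"a": 0.25, "b": 0.75}
    w = "aabab"
    # Under the uniform measure every letter costs log2 |Sigma| = 1 bit.
    print(self_information(w, uniform))
    # Under the skewed measure the rarer letter "a" costs 2 bits.
    print(self_information(w, skewed))
```

With the uniform measure the result reduces to $|w| \log |\Sigma|$, the denominator of (1.1) and (1.2).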

In this paper we show how to use the constructive dimensions $\dim^\beta(S)$ and $\operatorname{Dim}^\beta(S)$ in conjunction
with randomness to measure the degree to which two probability measures on $\Sigma$ are similar. To see why
this might be possible, we note that the inequalities

\[ 0 \le \dim^\beta(S) \le \operatorname{Dim}^\beta(S) \le 1 \]

hold for all $\beta$ and $S$ and that the maximum values

\[ \dim^\beta(R) = \operatorname{Dim}^\beta(R) = 1 \tag{1.5} \]

are achieved if (but not only if) the sequence $R$ is random with respect to $\beta$. It is thus reasonable to hope
that, if $R$ is random with respect to some other probability measure $\alpha$ on $\Sigma$, then $\dim^\beta(R)$ and $\operatorname{Dim}^\beta(R)$
will take on values whose closeness to 1 reflects the degree to which $\alpha$ is similar to $\beta$.

This is indeed the case. Our first main theorem says that the divergence formula

\[ \dim^\beta(R) = \operatorname{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)} \tag{1.6} \]

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with
respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $\mathcal{D}(\alpha \| \beta)$ is the Kullback-Leibler
divergence between $\alpha$ and $\beta$. When $\alpha = \beta$, the Kullback-Leibler divergence $\mathcal{D}(\alpha \| \beta)$ is 0, so (1.6)
coincides with (1.5). When $\alpha$ and $\beta$ are dissimilar, the Kullback-Leibler divergence $\mathcal{D}(\alpha \| \beta)$ is large,
so the right-hand side of (1.6) is small. Hence the divergence formula tells us that, when $R$ is $\alpha$-random,
$\dim^\beta(R) = \operatorname{Dim}^\beta(R)$ is a quantity in $[0, 1]$ whose closeness to 1 is an indicator of the similarity between
$\alpha$ and $\beta$.
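The right-hand side of the divergence formula can be evaluated directly for concrete measures. The sketch below assumes positive measures given as dictionaries over a common alphabet and base-2 logarithms; the function names and the example measures are hypothetical and chosen only for illustration.

```python
import math
from typing import Mapping


def entropy(alpha: Mapping[str, float]) -> float:
    """Shannon entropy H(alpha) in bits; alpha is assumed positive."""
    return sum(p * math.log2(1.0 / p) for p in alpha.values())


def kl_divergence(alpha: Mapping[str, float], beta: Mapping[str, float]) -> float:
    """Kullback-Leibler divergence D(alpha || beta) in bits."""
    return sum(p * math.log2(p / beta[a]) for a, p in alpha.items())


def divergence_ratio(alpha: Mapping[str, float], beta: Mapping[str, float]) -> float:
    """H(alpha) / (H(alpha) + D(alpha || beta)), the right-hand side of the formula."""
    h = entropy(alpha)
    return h / (h + kl_divergence(alpha, beta))


if __name__ == "__main__":
    fair = {"0": 0.5, "1": 0.5}      # hypothetical example measures
    biased = {"0": 0.25, "1": 0.75}
    print(divergence_ratio(fair, fair))    # identical measures
    print(divergence_ratio(fair, biased))  # alpha = fair, beta = biased
    print(divergence_ratio(biased, fair))  # roles swapped
```

Because the Kullback-Leibler divergence is not symmetric, swapping the roles of the two measures generally changes the value, so the quantity is a directed indicator of similarity rather than a metric.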
The proof of (1.6) serves as an outline of our other, more challenging task, which is to prove that the
divergence formula (1.6) also holds for the much more effective finite-state $\beta$-dimension $\dim_{\mathrm{FS}}^\beta(R)$ and
finite-state strong $\beta$-dimension $\operatorname{Dim}_{\mathrm{FS}}^\beta(R)$. (These dimensions are generalizations of finite-state
dimension and finite-state strong dimension, which were introduced in [5, 1], respectively.)

With this objective in mind, our second main theorem characterizes the finite-state $\beta$-dimensions in
terms of finite-state data compression. Specifically, this theorem says that, in analogy with (1.3) and
(1.4), the identities

\[ \dim_{\mathrm{FS}}^\beta(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{1.7} \]

and

\[ \operatorname{Dim}_{\mathrm{FS}}^\beta(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{1.8} \]

hold for all infinite sequences $S$ over $\Sigma$. The infima here are taken over all information-lossless
finite-state compressors (a model introduced by Shannon [18] and investigated extensively ever since) $C$ with
output alphabet $\{0, 1\}$, and $|C(w)|$ denotes the number of bits that $C$ outputs when processing the prefix $w$
of $S$. The special cases of (1.7) and (1.8) in which $\beta$ is the uniform probability measure on $\Sigma$, and hence
$\mathcal{I}_\beta(w) = |w| \log |\Sigma|$, were proven in [5, 1]. In fact, our proof uses these special cases as "black boxes"
from which we derive the more general (1.7) and (1.8).
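To make the ratio $|C(w)| / \mathcal{I}_\beta(w)$ concrete, the sketch below evaluates it for one very simple compressor: a fixed symbol-by-symbol prefix code. Treating such a code as an instance of the compressor model discussed above is an assumption of this example, as are all function names, the code table, and the measures. The sketch evaluates a single compressor on a single finite prefix; it does not compute the infimum over compressors or the limit along prefixes.

```python
import math
from typing import Mapping


def self_information(w: str, beta: Mapping[str, float]) -> float:
    """I_beta(w) = sum over i of log2(1 / beta(w[i])), in bits."""
    return sum(math.log2(1.0 / beta[symbol]) for symbol in w)


def prefix_code_length(w: str, code: Mapping[str, str]) -> int:
    """Number of output bits |C(w)| when C encodes w one symbol at a time."""
    return sum(len(code[symbol]) for symbol in w)


def compression_ratio(w: str, code: Mapping[str, str], beta: Mapping[str, float]) -> float:
    """|C(w)| / I_beta(w) for a nonempty prefix w."""
    return prefix_code_length(w, code) / self_information(w, beta)


if __name__ == "__main__":
    # Hypothetical prefix code and measures over the alphabet {a, b, c}.
    code = {"a": "0", "b": "10", "c": "11"}
    dyadic = {"a": 0.5, "b": 0.25, "c": 0.25}
    uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
    w = "abcabc"
    print(compression_ratio(w, code, dyadic))   # code lengths match the letter costs
    print(compression_ratio(w, code, uniform))  # same code, different letter costs
```

When each codeword length equals the corresponding letter cost $\log(1/\beta(a))$, the ratio is 1 on every nonempty prefix, which shows how the choice of $\beta$ rescales what counts as good compression.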

With (1.7) and (1.8) in hand, we prove our third main theorem. This involves the finite-state version
of randomness, which was introduced by Borel [?] long before finite-state automata were defined. If $\alpha$
is a probability measure on $\Sigma$, then a sequence $S \in \Sigma^\infty$ is $\alpha$-normal in the sense of Borel if every finite
string $w \in \Sigma^*$ appears with asymptotic frequency $\alpha(w)$ in $S$, where we write

\[ \alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]). \]

Our third main theorem says that the divergence formula

\[ \dim_{\mathrm{FS}}^\beta(R) = \operatorname{Dim}_{\mathrm{FS}}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha \| \beta)} \tag{1.9} \]

holds whenever $\alpha$ and $\beta$ are positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is $\alpha$-normal.
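The normality condition compares two numbers for each string $w$: the product $\alpha(w)$ and the frequency with which $w$ occurs in the sequence. A minimal sketch of both quantities on a finite prefix follows. Counting overlapping occurrences over all starting positions is one common way to formalize "asymptotic frequency" and is an assumption here, as are the function names and the example.

```python
from typing import Mapping


def product_measure(w: str, alpha: Mapping[str, float]) -> float:
    """alpha(w) = product over i of alpha(w[i])."""
    p = 1.0
    for symbol in w:
        p *= alpha[symbol]
    return p


def block_frequency(prefix: str, w: str) -> float:
    """Fraction of starting positions in `prefix` at which w occurs.

    ASSUMPTION: overlapping occurrences are counted, and the frequency
    is taken over the len(prefix) - len(w) + 1 possible positions.
    """
    positions = len(prefix) - len(w) + 1
    if positions <= 0:
        return 0.0
    hits = sum(1 for i in range(positions) if prefix.startswith(w, i))
    return hits / positions


if __name__ == "__main__":
    fair = {"a": 0.5, "b": 0.5}  # hypothetical example measure
    prefix = "ab" * 1000         # a periodic sequence prefix
    for w in ("a", "aa", "ab"):
        print(w, block_frequency(prefix, w), product_measure(w, fair))
```

In the periodic example the single letters have the right frequency, but the block `aa` never occurs although the fair measure assigns it probability 1/4, so no sequence with these prefixes can be normal for that measure.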
Acknowledgments. I thank Xiaoyang Gu and Elvira Mayordomo for useful discussions. 